5/1/2019

Introduction

Solar photovoltaic (PV) power is becoming more popular thanks to its environmental benefits and falling cost, and the world is transitioning to this cleaner and more efficient power source. The DeepSolar Project (Yu et al., 2018) provides a comprehensive database with accurate location and size information for PV installations in the contiguous U.S.

The dataset has 169 features and 72,537 observations, each of which is a region with a unique FIPS number. The features fall into eight broad categories: identity, response, geography, natural conditions, demography, government policy, lifestyle and others.

Goal of this project

Using visualization tools and statistical learning techniques, this project aims to reveal the most important factors that make a region favorable for PV. Based on our findings, we propose suggestions for developing both residential PV and nonresidential PV (PV farms).

Libraries used

  • leaflet (visualization)
  • glmnet
  • rgdal
  • readr
  • dplyr

Non-residential PV (PV farm)

Rediske et al. (2018), in "Determinant factors in site selection for photovoltaic projects: A systematic review", identify 28 justified factors. DeepSolar (our dataset) does not cover all of them, but it does include several important climate factors and population density.

Influential factors for PV farms

  • Climate factors
    • Solar radiation (positive)
    • Air temperature (negative)
    • Wind speed (negative)
    • Relative humidity (negative)
    • Water area (percentage) (negative)
  • Economic factors
    • Population density (positive)
    • Electricity consumption (positive)
    • Elevation (negative)

Influential factors for residential PV

We used ridge and lasso regression to select the features most influential for residential PV development. After combining some features, three factors stood out as the most influential:

  • Residential factors
    • senior rate (positively associated)
    • higher education rate (positively associated)
    • black rate (positively associated)
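The feature-selection step can be sketched with glmnet, which is among the libraries listed earlier. This is only an illustration on synthetic data, not the project's actual model: the response, the noise columns, and the choice of `lambda.1se` are all assumptions made for the example.

```r
library(glmnet)

set.seed(1)
n <- 200
# Synthetic stand-ins for candidate features (values are made up)
x <- matrix(rnorm(n * 5), nrow = n,
            dimnames = list(NULL, c("senior_rate", "higher_education_rate",
                                    "noise_1", "noise_2", "noise_3")))
# Hypothetical response that depends only on the first two features
y <- 3 * x[, "senior_rate"] + 2 * x[, "higher_education_rate"] + rnorm(n)

# alpha = 1 gives the lasso penalty; alpha = 0 would give ridge
cv_fit <- cv.glmnet(x, y, alpha = 1)

# Coefficients at the more conservative cross-validated lambda
coefs <- as.matrix(coef(cv_fit, s = "lambda.1se"))
selected <- setdiff(rownames(coefs)[coefs[, 1] != 0], "(Intercept)")
print(selected)
```

Lasso shrinks some coefficients exactly to zero, so the non-zero ones act as the selected features. Ridge only shrinks coefficients toward zero, so with ridge the features would instead be ranked by the size of their standardized coefficients.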

Influential factors

climate factors

 air_temperature relative_humidity daily_solar_radiation   wind_speed   
 Min.   : 1.90   Min.   :0.3280    Min.   :3.300         Min.   :2.800  
 1st Qu.: 9.80   1st Qu.:0.6400    1st Qu.:3.780         1st Qu.:3.600  
 Median :12.90   Median :0.6720    Median :4.110         Median :4.200  
 Mean   :13.44   Mean   :0.6433    Mean   :4.254         Mean   :4.124  
 3rd Qu.:17.10   3rd Qu.:0.6920    3rd Qu.:4.610         3rd Qu.:4.600  
 Max.   :24.80   Max.   :0.8020    Max.   :5.680         Max.   :6.600  
 water_area_perc  
 Min.   :0.00000  
 1st Qu.:0.00000  
 Median :0.00226  
 Mean   :0.03512  
 3rd Qu.:0.01857  
 Max.   :0.99769  

Influential factors

economic factors

   elevation      population_density electricity_consume_total
 Min.   :   1.0   Min.   :     0.0   Min.   :   20802         
 1st Qu.: 125.0   1st Qu.:   338.8   1st Qu.:   73369         
 Median : 237.0   Median :  2303.3   Median :  137586         
 Mean   : 346.2   Mean   :  5563.2   Mean   :  257644         
 3rd Qu.: 366.0   3rd Qu.:  5508.0   3rd Qu.:  231624         
 Max.   :2676.0   Max.   :454706.9   Max.   :19880440         

Influential factors

Residential factors

  senior_rate      higher_education_rate   black_rate      
 Min.   :0.00000   Min.   :0.0000        Min.   :0.000000  
 1st Qu.:0.09894   1st Qu.:0.3233        1st Qu.:0.008977  
 Median :0.14001   Median :0.3863        Median :0.041075  
 Mean   :0.14806   Mean   :0.3791        Mean   :0.138875  
 3rd Qu.:0.18214   3rd Qu.:0.4400        3rd Qu.:0.153037  
 Max.   :1.00000   Max.   :1.0000        Max.   :1.000000  

Clustering

We want to cluster these regions into three clusters:

  • Most suitable for PV development in terms of natural conditions, PV farm cost, and residential PV.
  • Not as suitable as the first, but good enough as a backup.
  • Last choice for a PV farm: expensive, with low return.

The variables differ greatly in magnitude, so they must be standardized before clustering.
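A minimal sketch of standardizing and then clustering into three groups is shown here. The `.km` suffix in the ANOVA output later in this deck suggests k-means was used, but that is an inference; the data below are synthetic and `nstart = 25` is an arbitrary choice.

```r
set.seed(42)
# Synthetic stand-in for the three economic variables (values are made up)
regions <- data.frame(
  elevation = c(rnorm(50, 150, 30), rnorm(50, 400, 50), rnorm(50, 1500, 200)),
  population_density = c(rnorm(50, 6000, 500), rnorm(50, 2000, 300),
                         rnorm(50, 200, 50)),
  electricity_consume_total = c(rnorm(50, 4e5, 3e4), rnorm(50, 1.5e5, 2e4),
                                rnorm(50, 5e4, 5e3))
)

# Center each column to mean 0 and scale it to standard deviation 1
regions_std <- scale(regions)

# k-means with three centers; several random starts make the result more stable
km <- kmeans(regions_std, centers = 3, nstart = 25)
regions$cluster <- km$cluster

# Cluster means on the original scale help decide which cluster is "best"
cluster_means <- aggregate(. ~ cluster, data = regions, FUN = mean)
print(cluster_means)
```

Without `scale()`, `electricity_consume_total` would dominate the Euclidean distances simply because its values are several orders of magnitude larger than the others.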

Scatterplot for climate factors

  • Solar radiation (positive)
  • Air temperature (negative)
  • Wind speed (negative)
  • Relative humidity (negative)
  • Water area (percentage) (negative)

Scatterplot for economic factors

  • Population density (positive)
  • Electricity consumption (positive)
  • Elevation (negative)

Scatterplot for residential factors

  • senior rate (positively associated)
  • higher education rate (positively associated)
  • black rate (positively associated)

Verifications (climate)

## Analysis of Variance Table
## 
## Response: sample_climate$daily_solar_radiation
##                                                  Df Sum Sq Mean Sq F value
## as.character(sample_climate$cluster_climate.km)   2 92.734  46.367  769.03
## Residuals                                       297 17.907   0.060        
##                                                    Pr(>F)    
## as.character(sample_climate$cluster_climate.km) < 2.2e-16 ***
## Residuals                                                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Only one factor's ANOVA is shown here; the other factors also differ significantly across clusters.
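A check of this kind can be reproduced with a one-way ANOVA of a variable against the cluster label. The sketch below uses synthetic data; the sample size of 300 is inferred from the degrees of freedom in the table above (2 + 297), and the cluster means are invented.

```r
set.seed(7)
# Synthetic sample of 300 regions with a hypothetical cluster label
sample_climate <- data.frame(
  cluster = rep(c("1", "2", "3"), each = 100),
  daily_solar_radiation = c(rnorm(100, 3.7, 0.25),
                            rnorm(100, 4.3, 0.25),
                            rnorm(100, 5.2, 0.25))
)

# Treat the cluster label as a categorical factor, not a number
fit <- lm(daily_solar_radiation ~ factor(cluster), data = sample_climate)
tab <- anova(fit)
print(tab)

# p-value of the F test for a difference in means between clusters
p_value <- tab[["Pr(>F)"]][1]
```

Converting the label with `factor()` (or `as.character()`, as in the output above) matters: a numeric label would be fitted as a single slope with one degree of freedom instead of as three groups.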

Verifications (economic)

## Analysis of Variance Table
## 
## Response: sample_eco$electricity_consume_total
##                                          Df     Sum Sq    Mean Sq F value
## as.character(sample_eco$cluster_eco.km)   2 2.5952e+16 1.2976e+16  830017
## Residuals                               297 4.6431e+12 1.5633e+10        
##                                            Pr(>F)    
## as.character(sample_eco$cluster_eco.km) < 2.2e-16 ***
## Residuals                                            
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Only one factor's ANOVA is shown here; the other factors also differ significantly across clusters.

Verifications (residential)

## Analysis of Variance Table
## 
## Response: sample_res$black_rate
##                                          Df  Sum Sq Mean Sq F value
## as.character(sample_res$cluster_res.km)   2 27.5161  13.758  860.36
## Residuals                               297  4.7493   0.016        
##                                            Pr(>F)    
## as.character(sample_res$cluster_res.km) < 2.2e-16 ***
## Residuals                                            
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Only one factor's ANOVA is shown here; the other factors also differ significantly across clusters.

Pick the best among the best

By taking the intersection of the best clusters from each factor group, we obtained the regions that are suitable for PV farms (nonresidential) and the regions that are good for residential PV development.

There are 7653 regions suitable for PV farms and 106 for residential PV.
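The intersection step can be sketched as follows. The FIPS codes, cluster labels, and the choice of cluster 1 as "best" are hypothetical and serve only to show the mechanics.

```r
# Hypothetical region IDs and cluster labels for two factor groups
regions <- data.frame(
  fips = c("06037101110", "06037101122", "48303000100",
           "04013010101", "32003000101"),
  cluster_climate = c(1, 1, 2, 1, 3),
  cluster_eco     = c(2, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)
best_climate <- 1  # assumed label of the best climate cluster
best_eco     <- 1  # assumed label of the best economic cluster

# Regions that fall in the best cluster of each factor group
climate_ok <- regions$fips[regions$cluster_climate == best_climate]
eco_ok     <- regions$fips[regions$cluster_eco == best_eco]

# Keep only regions that are in the best cluster for both groups
pv_farm_candidates <- intersect(climate_ok, eco_ok)
print(pv_farm_candidates)
```

A region survives only if it is in the best cluster for every factor group considered, which is why the candidate set is much smaller than any single best cluster.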

Merge small regions into counties

We obtained the county FIPS codes for all of the selected regions.
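One way to do this, assuming each region is identified by an 11-digit census-tract FIPS code (an assumption; the deck does not state the code length), is to keep the first five digits, which encode state and county. The codes below are invented for illustration.

```r
# Hypothetical 11-digit tract FIPS codes, stored as character
# so that leading zeros are preserved
tract_fips <- c("04013010101", "04013010102", "06037101110",
                "48303000100", "06037101122")

# State (2 digits) + county (3 digits) form the 5-digit county code
county_fips <- substr(tract_fips, 1, 5)

# Many regions collapse into a much shorter list of distinct counties
selected_counties <- sort(unique(county_fips))
print(selected_counties)
```

If the codes were read in as numbers, states such as Arizona (04) and California (06) would lose their leading zero, and the first five characters would no longer be the county code.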

70 counties suitable for PV farms:

 04012 04013 04015 04019 04021 04023 04027 06001 06007 06009 06011
 06013 06017 06019 06021 06025 06029 06031 06033 06037 06039 06041
 06043 06047 06053 06055 06057 06059 06061 06065 06067 06069 06071
 06073 06077 06079 06083 06085 06087 06089 06095 06097 06099 06101
 06103 06105 06107 06109 06111 06113 06115 32003 48033 48103 48107
 48115 48135 48153 48169 48189 48227 48303 48305 48317 48329 48371
 48415 48443 48461 48475

8 counties good for residential PV:

  04013 06037 06073 06079 06095 32003 48303 48375

Visualization on the Map

We calculated how far each county's PV development is above the average of all selected (best) counties and above the national average. Counties are colored by whether their PV development is above the best-county average.
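The comparison can be sketched as below. The column `pv_level`, its values, the national average, and the color names are all hypothetical; the deck does not say which PV measure was used.

```r
# Hypothetical county-level PV measure (values are made up)
counties <- data.frame(
  fips = c("04013", "06037", "06073", "48303"),
  pv_level = c(0.020, 0.050, 0.030, 0.010),
  stringsAsFactors = FALSE
)
national_avg <- 0.015  # assumed national average, for illustration only

# Average over the selected (best) counties
best_avg <- mean(counties$pv_level)

# Gap to each reference level; positive means above average
counties$above_best_avg     <- counties$pv_level - best_avg
counties$above_national_avg <- counties$pv_level - national_avg

# Darker color for counties above the best-county average
counties$color <- ifelse(counties$above_best_avg > 0,
                         "darkgreen", "lightgreen")
print(counties)
```

A `color` column of this kind could then be passed as a fill color when drawing county polygons with leaflet.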

Residential PV

As the map shows, only one county is above the best average, so the other seven have great potential to develop residential PV.

PV farms

Most of the selected counties for PV farms are in California and Texas (FIPS prefixes 06 and 48 in the county list), which matches common sense. Counties shown in the lighter color should be considered for additional PV farms.

Acknowledgement

Kereush, D., & Perovych, I. (2017). Determining Criteria For Optimal 
  Site Selection For Solar Power Plants. Geomatics, Landmanagement
  and Landscape, 4, 39-54. doi:10.15576/gll/2017.4.39

Rediske, G., Siluk, J. C., Gastaldo, N. G., Rigo, P. D., & Rosa, C. B. (2018). 
  Determinant factors in site selection for photovoltaic projects: 
  A systematic review. International Journal of Energy Research, 43(5), 
  1689-1701. doi:10.1002/er.4321

Yu, J., Wang, Z., Majumdar, A., & Rajagopal, R. (2018). DeepSolar: 
  A Machine Learning Framework to Efficiently Construct a Solar 
  Deployment Database in the United States. Joule, 2(12), 2605-2617.
  doi:10.1016/j.joule.2018.11.021